Intel/Qwen3-14B-int4-AutoRound-inc

Model Details

This is an int4 version of Qwen/Qwen3-14B, quantized symmetrically with group_size 128 and generated by intel/auto-round.
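To make "int4, group_size 128, symmetric" concrete, here is a minimal sketch of group-wise symmetric quantization using plain round-to-nearest. It is an illustration only, not AutoRound's algorithm: AutoRound tunes the rounding via signed gradient descent (see the citation at the end of this card). The scale rule (`max_abs / 7`), the helper names, and the toy weight row are assumptions made for this example.

```python
GROUP_SIZE = 128      # matches the group_size stated above
QMIN, QMAX = -8, 7    # signed 4-bit integer range


def quantize_group(group):
    """Quantize one group to int4 codes with a single shared scale (round-to-nearest)."""
    max_abs = max(abs(w) for w in group)
    # Assumed scale rule: map the largest magnitude onto QMAX; no zero point (symmetric).
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    codes = [min(QMAX, max(QMIN, round(w / scale))) for w in group]
    return codes, scale


def dequantize_group(codes, scale):
    """Recover approximate float weights from int4 codes and their scale."""
    return [c * scale for c in codes]


def quantize_row(row, group_size=GROUP_SIZE):
    """Split a weight row into consecutive groups and quantize each independently."""
    return [quantize_group(row[i:i + group_size]) for i in range(0, len(row), group_size)]


# Toy weight row (hypothetical values in [-0.5, 0.5]), two groups of 128.
row = [((i * 37) % 101 - 50) / 100.0 for i in range(256)]
groups = quantize_row(row)
restored = [v for codes, scale in groups for v in dequantize_group(codes, scale)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups={len(groups)}, max abs reconstruction error={max_err:.4f}")
```

Each group of 128 weights stores 128 four-bit codes plus one scale, so the per-group reconstruction error is bounded by half a quantization step.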

How To Use

INT4 Inference (CPU/CUDA/Intel GPU)

from transformers import AutoModelForCausalLM, AutoTokenizer
quantized_model_dir = "Intel/Qwen3-14B-int4-AutoRound-inc"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,  # adjust this to match the official usage
    do_sample=False  # adjust this to match the official usage
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
##INT4:
# thinking content: <think>
# Okay, the user wants a short introduction to large language models. Let me start by defining what they are. I should mention that they're AI systems trained on vast amounts of text data. Maybe explain their capabilities, like understanding and generating human-like text. I need to cover different applications, such as answering questions, writing stories, coding, etc. Also, it's important to note their training process, using deep learning techniques like transformers. I should mention their ability to handle multiple languages and adapt to various tasks. But I should keep it concise, so avoid too much technical jargon. Maybe end with their impact on technology and industries. Let me check if I'm missing anything. Oh, maybe mention some examples like GPT, BERT, or other models. But since the user asked for a short intro, maybe just refer to them as examples without going into detail. Alright, that should cover the basics without being too lengthy.
# </think>
# content: A **large language model (LLM)** is an advanced artificial intelligence system trained on vast amounts of text data to understand and generate human-like text. These models use deep learning techniques, such as transformer architectures, to process and analyze language patterns, enabling them to perform tasks like answering questions, writing stories, coding, summarizing text, and more. LLMs excel at handling multiple languages, adapting to diverse contexts, and producing coherent, context-aware responses. They power applications ranging from chatbots and virtual assistants to content creation tools and research aids, revolutionizing how humans interact with technology. Examples include models like GPT, BERT, and others developed by companies like OpenAI, Google, and Meta.

##BF16:
# thinking content: <think>
# Okay, the user wants a short introduction to large language models. Let me start by defining what they are. I should mention that they're AI models trained on vast amounts of text data. Maybe explain their capabilities, like generating text, answering questions, and understanding context.
# 
# I need to highlight their applications, such as in chatbots, content creation, and research. Also, it's important to note their strengths, like handling multiple languages and complex tasks. But I shouldn't forget to mention the challenges, like data bias and computational needs. Keep it concise but informative. Make sure the language is clear and not too technical. Let me check if I covered the key points without getting too detailed. Alright, that should work.
# </think>
# content: A **large language model (LLM)** is an advanced artificial intelligence system trained on vast amounts of text data to understand and generate human-like language. These models excel at tasks like answering questions, writing text, translating languages, coding, and more, by recognizing patterns and context in the input they receive. Powered by deep learning and massive computational resources, LLMs like GPT, BERT, or LLaMA can process and produce coherent, contextually relevant responses across diverse domains. They are widely used in chatbots, content creation, research, and automation, though challenges like bias, data privacy, and ethical use remain critical areas of focus.


prompt = "9.11和9.8哪个数字大"  # "Which number is larger, 9.11 or 9.8?"
##INT4:
# thinking content: <think>
# 嗯，用户问的是9.11和9.8哪个数字大。首先，我需要确认这两个数字的数值大小。看起来都是小数，但可能有不同的位数。首先，我应该比较它们的整数部分。9.11的整数部分是9，而9.8的整数部分也是9，所以整数部分相同，需要比较小数部分。
# 接下来，比较小数部分。9.11的小数部分是0.11，而9.8的小数部分是0.8。这时候，可能需要将它们转换成相同的位数来比较，或者直接比较数值。0.8等于0.80，而0.11显然比0.80小，所以9.8应该更大。
# 不过，用户可能对小数点后的位数有疑问，比如是否需要补零。比如，9.8可以看作9.80，这样比较的话，0.80确实比0.11大。所以结论是9.8更大。
# 不过，也有可能用户有其他考虑，比如是否在特定的上下文中，比如日期或者其他含义，但问题中没有提到，所以应该按数值比较。另外，检查是否有输入错误，比如9.11是否是9.11还是其他形式，但看起来是正确的。
# 再想一下，有没有可能用户混淆了小数点后的位数，比如9.11是两位小数，而9.8是一位，但比较时应该按数值大小，而不是位数。所以，不管位数多少，只要数值更大就行。0.8比0.11大，所以9.8更大。
# 确认一下，9.8等于9.80，而9.11是9.11，所以9.80确实比9.11大。因此，答案应该是9.8更大。
# </think>
# content: 9.8 比 9.11 大。
# **比较过程：**
# 1. **整数部分**：两者整数部分均为 **9**，相同。
# 2. **小数部分**：
#   - 9.11 的小数部分为 **0.11**。
#   - 9.8 的小数部分为 **0.8**（可视为 **0.80** 以统一位数）。
# 3. **数值比较**：  
#   $0.

##BF16:
# content: <think>
# 嗯，用户问的是9.11和9.8哪个数字大。首先，我需要确认用户的问题是否正确，因为这两个数字看起来像是小数，但可能有不同的含义。比如，9.11可能是指9月11日，而9.8可能是指9月8日，或者它们都是小数，比如9.11和9.80。不过，用户的问题里没有提到日期，所以可能只是单纯的数值比较。
# 首先，我应该先假设用户是想比较两个小数的大小。这时候，我需要回忆一下小数比较的方法。比较小数时，首先比较整数部分，如果整数部分相同，再比较小数部分。这里两个数的整数部分都是9，所以接下来比较小数部分。
# 9.11的小数部分是0.11，而9.8的小数部分是0.8。这时候，我需要比较0.11和0.8的大小。通常，比较小数时，可以将它们转换成同位数，比如0.11和0.80，这样比较的话，0.80显然比0.11大，所以9.8比9.11大。
# 不过，也有可能用户有其他的意图，比如9.11指的是9月11日，而9.8指的是9月8日，这时候比较日期的话，9月11日显然比9月8日晚，所以9.11更大。但用户的问题里没有提到日期，所以可能还是数值比较更合理。
# 不过，也有可能用户输入有误，比如9.11和9.8是否都是小数，或者有没有其他符号。比如，9.11可能是一个版本号，而9.8是另一个版本，这时候可能需要看具体的上下文。但如果没有更多上下文，应该按照数值比较来处理。
# 再检查一下，如果用户输入的是9.11和9.8，那么作为小数的话，9.8等于9.80，而9.11是9.11，所以9.80比9.11大。因此，答案应该是9.8更大。
# 不过，也有可能用户有拼写错误，比如9.11是否是9.11，而9.8是否是9.8，或者有没有其他数字。但根据问题中的写法，应该是正确的。


prompt = "How many r in word strawberry"
##INT4:
# thinking content: <think>
# Okay, let's see. The user is asking how many times the letter 'r' appears in the word "strawberry". Hmm, first I need to make sure I have the correct spelling of the word. Strawberry is spelled S-T-R-A-W-B-E-R-R-Y. Let me write that out to visualize it better: S, T, R, A, W, B, E, R, R, Y.
# Now, I need to count each occurrence of the letter 'r'. Let's go through each letter one by one. The first letter is S, not an R. The second is T, also not. The third is R, that's one. Then A, W, B, E – none of those are R. The next letter is R again, that's two. Then another R, that's three. The last letter is Y. So in total, there are three R's in "strawberry".
# Wait, let me double-check to make sure I didn't miss any. Let's count again: S (1), T (2), R (3), A (4), W (5), B (6), E (7), R (8), R (9), Y (10). So positions 3, 8, and 9 are R's. That's three times. Yeah, that seems right. I don't think I made a mistake here. The word has three R's.
# </think>
# content: The word **"strawberry"** contains **3** instances of the letter **'r'**. 
# **Breakdown:**  
# S-T-**R**-A-W-B-E-**R**-**R**-Y  
# Positions 3, 8, and 9 are 'r's.  
# **Answer:** 3.

##BF16:
# thinking content: <think>
# Okay, let's see. The user is asking how many times the letter "r" appears in the word "strawberry." Hmm, first I need to make sure I have the correct spelling of the word. Strawberry is spelled S-T-R-A-W-B-E-R-R-Y. Let me write that out to visualize it better: S, T, R, A, W, B, E, R, R, Y.
# Now, I need to count each occurrence of the letter "r." Let's go through each letter one by one. Starting with the first letter, S – that's not an R. Next is T – also not an R. The third letter is R, so that's one. Then A, W, B, E – none of those are R. The next letter is R again, that's the second one. Then another R right after that, making it the third R. Finally, Y at the end. 
# Wait, let me check again to make sure I didn't miss any. The letters are S, T, R, A, W, B, E, R, R, Y. So positions 3, 8, and 9 are R's. That's three R's in total. But wait, sometimes people might miscount if they skip a letter or double-count. Let me count once more: first R is the third letter, then after E comes R, then another R. Yes, that's three. 
# I think that's correct. So the answer should be 3. But just to be absolutely sure, maybe I can write out the word and underline each R. S-T-R-A-W-B-E-R-R-Y. Underlining the R's: the third letter, then the eighth and ninth. Yep, three R's. Okay, I'm confident now.
# </think>
# content: The word **"strawberry"** contains **3** instances of the letter **"r"**. 
# **Breakdown:**  
# S-T-**R**-A-W-B-E-**R**-**R**-Y  
# Positions 3, 8, and 9 are all "r".  
# **Answer:** 3.

prompt = "请简短介绍一下阿里巴巴公司"  # "Please give a brief introduction to Alibaba"
##INT4:
# thinking content: <think>
# 好的，用户让我简短介绍一下阿里巴巴公司。首先，我需要确定用户的需求是什么。可能他们需要一个快速了解阿里巴巴的概述，比如成立时间、主要业务、创始人等基本信息。
# 接下来，我要确保信息准确。阿里巴巴成立于1999年，由马云和他的团队创立，总部在杭州。主要业务包括电商平台（淘宝、天猫、京东）、云计算（阿里云）、数字支付（支付宝）、物流（菜鸟网络）等。这些都是关键点，不能遗漏。
# 然后，用户可能想知道阿里巴巴的影响力。作为全球最大的电商平台之一，阿里云在云计算领域排名靠前，支付宝改变了中国的支付方式，这些都能体现其重要性。另外，提到国际化战略，比如Lazada、阿里国际站，显示其全球布局。
# 还要注意用户可能的深层需求。他们可能是在做报告、写文章，或者只是好奇。所以信息要简洁明了，结构清晰，分点列出可能更好，但用户要求简短，所以需要整合成一段。
# 需要避免使用专业术语，保持口语化，但因为是公司介绍，可能需要稍微正式一点。同时，检查是否有最新的数据，比如成立时间是否正确，是否还有其他重要业务需要提及，比如蚂蚁集团，但可能用户只需要主要部分。
# 最后，确保没有错误，比如成立时间、创始人名字、总部地点等。确认后，组织语言，保持简短，覆盖关键点，不超过两段，每段不要太长。
# </think>
# content: 阿里巴巴集团（Alibaba Group）成立于1999年，总部位于中国杭州，由马云及其团队创立。公司最初以B2B电商平台起家，现已成为全球领先的互联网科技企业，业务涵盖电子商务（淘宝、天猫、京东）、云计算（阿里云）、数字支付（支付宝）、物流（菜鸟网络）、金融科技（蚂蚁集团）及国际化业务（如Lazada、阿里国际站）等。阿里巴巴通过技术创新和生态布局，深刻影响了全球数字经济的发展。

##BF16:
# thinking content: <think>
# 嗯，用户让我简短介绍一下阿里巴巴公司。首先，我需要确定用户的需求是什么。可能他们只是想了解阿里巴巴的基本信息，或者是在做作业、写报告，或者只是好奇。不管怎样，用户需要的是简洁明了的回答，不能太冗长。
# 接下来，我得回忆一下阿里巴巴的基本信息。阿里巴巴集团成立于1999年，总部在杭州，对吧？创始人是马云，不过现在可能已经卸任了。阿里巴巴主要业务包括电商平台，比如淘宝、天猫，还有菜鸟网络、支付宝、阿里云这些子公司。可能还需要提到它的全球影响力，比如在电子商务、云计算、金融科技方面的成就。
# 用户可能还想知道阿里巴巴的规模，比如员工数量、市值，或者它在行业中的地位。不过用户要求简短，所以这些细节可能需要简化。另外，可能需要提到它的使命，比如让天下没有难做的生意，或者推动数字化转型。
# 还要注意避免错误信息，比如马云是否还在任。我记得他已经在2019年卸任董事局主席，现在是董事会成员。另外，阿里巴巴的上市时间是2014年，港交所和纽交所都有。可能需要提到这些关键点。
# 用户可能没有说出来的深层需求是希望了解阿里巴巴的核心业务和影响力，而不仅仅是表面信息。所以需要突出其作为全球最大的电商平台之一，以及在云计算和金融科技领域的领先地位。同时，可能还要提到其社会责任，比如环保、公益项目等。
# 需要确保回答结构清晰，分点列出，但用户要求简短，所以可能用一两句话概括。比如成立时间、总部、创始人、主要业务、全球地位等。还要注意用词准确，比如“集团”而不是“公司”，因为阿里巴巴有多个子公司。
# 最后检查有没有遗漏的重要信息，比如是否提到支付宝和阿里云，这些是阿里巴巴的重要组成部分。同时，可能需要提到其对中国经济和全球电商的影响，但保持简洁。确保没有错误，比如成立年份、上市地点等。这样用户就能得到一个全面但简短的介绍。
# </think>
# content: 阿里巴巴集团（Alibaba Group）成立于1999年，总部位于中国杭州，由马云创立。它是全球领先的互联网科技公司，核心业务涵盖电子商务（淘宝、天猫）、云计算（阿里云）、金融科技（支付宝）、物流（菜鸟网络）及创新业务（如盒马鲜生、阿里健康等）。阿里巴巴致力于通过数字化技术赋能企业与消费者，推动全球商业变革，旗下拥有
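The `</think>` parsing step in the script above can be isolated as a small helper. The sketch below reuses the same reverse-search logic and the token id 151668 from the script; the function name and the toy id lists are hypothetical and are not real tokenizer output. It also shows the fallback path: if generation stops before `</think>` is emitted (for example because `max_new_tokens` is reached), the index falls back to 0 and all tokens are treated as content, which is consistent with the BF16 sample for the 9.11 prompt, where the reasoning appears under "content".

```python
THINK_END_ID = 151668  # token id of </think>, as used in the script above


def split_thinking(output_ids, end_id=THINK_END_ID):
    """Split generated ids into (thinking_ids, answer_ids) at the last end_id."""
    try:
        # Index just past the last occurrence of end_id (an rindex on a list).
        index = len(output_ids) - output_ids[::-1].index(end_id)
    except ValueError:
        # No </think> found: everything is treated as answer content.
        index = 0
    return output_ids[:index], output_ids[index:]


# Toy ids for illustration only.
finished = [11, 12, THINK_END_ID, 21, 22]   # reasoning closed, then an answer
truncated = [11, 12, 13]                    # generation cut off mid-reasoning

print(split_thinking(finished))
print(split_thinking(truncated))
```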

Evaluate the model

pip3 install lm-eval

auto-round-eval --model "Intel/Qwen3-14B-int4-AutoRound-inc" --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,cmmlu,ceval-valid

Metric	BF16	INT4(best)	INT4(default)
Avg	0.6491	0.6484	0.6467
arc_easy	0.8409	0.8367	0.8396
arc_challenge	0.5845	0.5845	0.5776
boolq	0.8933	0.8917	0.8954
ceval-valid	0.8210	0.8217	0.8098
cmmlu	0.8020	0.7951	0.7942
gsm8k 5 shots	0.8832	0.8908	0.8863
hellaswag	0.6095	0.6035	0.6030
lambada_openai	0.6773	0.6788	0.6761
leaderboard_mmlu_pro 5 shots	0.5322	0.5281	0.5289
leaderboard_ifeval inst_level_strict_acc	0.4173	0.4245	0.4269
leaderboard_ifeval prompt_level_strict_acc	0.2717	0.2699	0.2736
mmlu	0.7714	0.7671	0.7671
openbookqa	0.3500	0.3440	0.3420
piqa	0.7992	0.7960	0.7971
truthfulqa_mc1	0.4027	0.4064	0.4027
winogrande	0.7285	0.7348	0.7269
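As a quick way to read the table, the sketch below recomputes the column means from the per-task rows and reports how much of the BF16 average each INT4 variant retains. The scores are copied from the table; treating the "Avg" row as an unweighted mean over the 16 rows (with the two leaderboard_ifeval metrics counted separately) is an assumption, though it reproduces the listed averages to within rounding.

```python
# Per-task scores copied from the table above: (BF16, INT4(best), INT4(default)).
SCORES = {
    "arc_easy": (0.8409, 0.8367, 0.8396),
    "arc_challenge": (0.5845, 0.5845, 0.5776),
    "boolq": (0.8933, 0.8917, 0.8954),
    "ceval-valid": (0.8210, 0.8217, 0.8098),
    "cmmlu": (0.8020, 0.7951, 0.7942),
    "gsm8k": (0.8832, 0.8908, 0.8863),
    "hellaswag": (0.6095, 0.6035, 0.6030),
    "lambada_openai": (0.6773, 0.6788, 0.6761),
    "leaderboard_mmlu_pro": (0.5322, 0.5281, 0.5289),
    "ifeval_inst_level_strict_acc": (0.4173, 0.4245, 0.4269),
    "ifeval_prompt_level_strict_acc": (0.2717, 0.2699, 0.2736),
    "mmlu": (0.7714, 0.7671, 0.7671),
    "openbookqa": (0.3500, 0.3440, 0.3420),
    "piqa": (0.7992, 0.7960, 0.7971),
    "truthfulqa_mc1": (0.4027, 0.4064, 0.4027),
    "winogrande": (0.7285, 0.7348, 0.7269),
}


def column_mean(col):
    """Unweighted mean of one column over all tasks."""
    return sum(row[col] for row in SCORES.values()) / len(SCORES)


def largest_drop(col):
    """Task with the largest BF16-minus-INT4 gap for the given INT4 column."""
    return max(SCORES, key=lambda task: SCORES[task][0] - SCORES[task][col])


bf16_mean = column_mean(0)
for name, col in (("INT4(best)", 1), ("INT4(default)", 2)):
    mean = column_mean(col)
    print(f"{name}: mean={mean:.4f}, "
          f"retention={mean / bf16_mean:.2%}, "
          f"largest drop on {largest_drop(col)}")
```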

Generate the model

Here is a sample command to generate the model.

auto-round-best \
--model Qwen/Qwen3-14B \
--device 0 \
--group_size 128 \
--bits 4 \
--format 'auto_round' \
--output_dir "./tmp_autoround"

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

To learn more about Intel's AI software, see Intel Neural Compressor.

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

arxiv github
